Measuring Contribution of HTML Features in Web Document Clustering
نویسندگان
چکیده
منابع مشابه
Measuring Contribution of HTML Features in Web Document Clustering
Documents in HTML format have many features to analyze, from the terms in special sections to the phrases that appear in the whole document. However, it is important to decide which feature contributes the most to separate documents according to classes. Given this information, it is possible not to include certain feature in the representation for the document, given that it is expensive to co...
متن کاملMeasuring Effectiveness of Text-Decorated HTML Tags in Web Document Clustering
Web document analysis, and its associated research, underpins much of what is referred to as web intelligence and the envisaged ‘semantic web’. A key issue in this field is how to encode a web document from the raft of potential document “features” without losing salient information. Current research almost always uses word-based feature vectors such as term frequency of specific words (TF) and...
متن کاملMulti-type Features Based Web Document Clustering
Clustering has been demonstrated as a feasible way to explore the contents of document collection and organize search engine results. For this task, many features of Web page, such as content, anchor text, URL, hyperlink etc, can be exploited and different results can be obtained. We expect to provide a unified and even better result for end users. Some work have studied how to use several type...
متن کاملHierarchical Fuzzy Clustering Semantics (HFCS) in Web Document for Discovering Latent Semantics
This paper discusses about the future of the World Wide Web development, called Semantic Web. Undoubtedly, Web service is one of the most important services on the Internet, which has had the greatest impact on the generalization of the Internet in human societies. Internet penetration has been an effective factor in growth of the volume of information on the Web. The massive growth of informat...
متن کاملWeb Document Clustering
Users of Web search engines are often forced to sift through the long ordered list of document “snippets” returned by the engines. The IR community has explored document clustering as an alternative method of organizing retrieval results, but clustering has yet to be deployed on the major search engines. The paper articulates the unique requirements of Web document clustering and reports on the...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: CLEI Electronic Journal
سال: 2008
ISSN: 0717-5000
DOI: 10.19153/cleiej.11.2.7